Running head: CONSTRUCTION OF A RATED L2 SPEECH CORPUS

Authors

  • Su-Youn Yoon
  • Lisa Pierce
  • Amanda Huensch
  • Eric Juul
  • Samantha Perkins
  • Richard Sproat
  • Mark Hasegawa-Johnson
Abstract

This work reports on the construction of a rated database of spontaneous speech produced by second language (L2) learners of English. Spontaneous speech was collected from 28 L2 speakers representing six language backgrounds and five different proficiency levels. Speech was elicited using formats similar to those of the TOEFL iBT and the SPEAK (Speaking Proficiency English Assessment Kit) test. A total of 182 minutes of spontaneous speech was collected, segmented, and assessed by two phonetically trained, experienced ESL instructors. The raters assigned a general fluency score and a phone accuracy score, with additional detailed comments on pronunciation errors. This database was designed with several applications in mind: the development of computer-aided pronunciation and fluency training; automatic assessment of fluency and pronunciation; and use as a tool for researchers working in automatic speech recognition and for linguists more generally. This database will be released to the public in the near future.

Keywords: rated speech corpus, L2, automated scoring

Introduction

This study reports on the construction of a rated, spontaneous speech database of second language (L2) learners of English. The purpose of such a rated speech corpus is to aid in the development of automatic speech fluency assessment and computer-aided pronunciation training (CAPT). The rated speech database will be used for the training and evaluation of such systems. It is generally acknowledged that a rated speech corpus is necessary for the development of such tools, and many such efforts are reported in the relevant literature. For example, Witt (1998), Kim, Franco and Neumeyer (1997) and Bratt, Neumeyer, Shriberg and Franco (1998) collected non-native speakers' read speech, and the accuracy of each phone was scored by trained raters. However, both databases were constructed from read speech. As such, they cannot be used to analyze the nature of spontaneous speech.
Spontaneous speech, for both L1 and L2 speakers, is complex in nature. It is characterized by pauses, filled pauses, hesitations, increased assimilation both within and across word boundaries, environmentally determined alternations, and lenition and fortition phenomena predictable from higher-level prosodic structures.

Recently, the Center for Spoken Language Understanding (CSLU) released the Foreign Accented English database. This database contains 4,925 spontaneous speech samples in English spoken by non-native speakers from 22 different native languages. Each speech file includes about 20 seconds of self-introduction. Three native speakers rated the accentedness of the sound files using a 4-point scale with 0 indicating "no accent" and 4 indicating a "very strong accent." Clearly, such a database is a valuable resource for researchers and scholars developing automated assessment systems for overall speech fluency. However, this database does not include an accuracy score for each phone, which would be useful for research on L2 learners' pronunciation in spontaneous speech, such as the acquisition of L2 phonemes and their actual use.

The database reported on here is constructed from spontaneous speech produced by L2 English learners. It was designed specifically for training and evaluating fluency and pronunciation in the context of spontaneous speech. The speech samples were recorded using an elicitation format similar to those used in the TOEFL iBT and the SPEAK test, both of which are fluency assessment tools. The database includes a general fluency score, again based on the TOEFL assessment rubric, and a phone accuracy score. All scoring was done by raters who are both experienced ESL teachers and linguistically trained phoneticians. The database includes L2 speakers from five different language backgrounds and at different fluency levels (from beginner to advanced).
It is annotated with raters' holistic fluency scores, scores for each phone, a transcription of both the target phone and any substituted phones, and detailed comments on the nature of any pronunciation errors. Given the level of annotation detail, it is anticipated that this corpus will be an excellent resource for researchers studying the spontaneous speech of L2 learners, for educators, for professionals in educational testing and assessment, and for researchers working in automatic speech recognition technology.

Construction of the annotated spontaneous speech database

Participants

Twenty-eight non-native speakers of English were recorded in the phonetics lab at the University of Illinois at Urbana-Champaign. Of the participants, 22 were recruited from intermediate and advanced level pronunciation classes at the Intensive English Institute (IEI) at the University of Illinois. Six participants were graduate students in the Linguistics department at the University of Illinois at Urbana-Champaign. The number of students from each language group and background information are provided in Tables 1 and 2 below. Details of the rating methods and procedures are provided in the section titled “Rating”.

Table 1. Native Languages of Speakers

Language    Number of Speakers
Korean      14
Chinese     8
Spanish     3
Other       3

Table 2. Background Information of Speakers

                                      Mean        Range
Age                                   27.7        18 ~ 34
Length of residence in US             1.3 years   1 month ~ 6 years
Age at start of English instruction   13.6        10 ~ 31

Asian students represented about 80% of the speaker population; 50% were Korean and 28% were Chinese. Other represented groups included Arabic and Turkish speakers (10%). The two groups (students from the IEI and graduate students from the Linguistics department) differed in age and length of residence (LOR) in the US. The mean age of the IEI students was 26.4, while the mean age of the graduate students was 31.6.
The mean LOR of the IEI students was 6.4 months; the mean LOR of the graduate students was 3.8 years. The age of onset of English instruction was similar across students, with an average of 13.6.

Material and procedures

The speech was recorded in a sound-attenuated booth in the phonetics laboratory at the University of Illinois at Urbana-Champaign. The speech data were collected using a set of 8 prompts. Two questions required the participants to describe a movie that they liked or a country they wished to visit. Two questions were picture description tasks, and two questions required the learners to provide an opinion about a social issue (after reading a short passage). Finally, two prompts required the participants to give directions (after reading a map). The questions were presented in a PowerPoint presentation on a computer screen. Participants were given 30 to 60 seconds (depending on the prompt) to prepare and 30 to 60 seconds to respond. An electronic beep signaled when they were to begin and end speaking. The allocated response time was tracked on the computer screen and was automatically reset at the end of either the response or the preparation time. In total, each speaker provided a 6.5-minute speech sample.

The frequency of each phoneme is important in automatic pronunciation assessment. In order to detect segmental pronunciation errors reliably, each phoneme should occur with a reasonable minimum frequency. In assessments using read speech, this is less problematic, since it is possible to use sentences balanced for the distribution and frequency of phonemes. The frequency of individual phonemes is less controllable in spontaneous speech samples. In order to address this, pronunciation error patterns predictable from differences between the L1 and L2 phonological systems were collected from Swan & Smith (2002).
From their study, English phonemes that cause the greatest difficulty for L2 learners whose native languages are Korean, Chinese, and Spanish were identified, and these phonemes were included in the map task prompt.

Description of the database

Transcription and statistics

The speech data were transcribed at the word level by two linguistics students. Word fragments, filled pauses, and silent pauses longer than 0.2 seconds were included in the transcription. Unintelligible words were treated as unknown words. From the transcription, the distributions of words and phonemes were analyzed. The speakers spoke 98.13 words per minute on average, with the fastest speaker producing 947 word tokens, twice as many as the slowest speaker, who produced 474 word tokens. However, the speakers differed less in the number of word types they used; the speaker with the greatest diversity used 290 different word types, while the least diverse speaker used 197 word types.
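
The token counts, type counts, and speaking rate described above can be illustrated with a minimal sketch. It assumes a plain-text, word-level transcript and a known speaking duration. The function name `transcript_stats`, the filled-pause list, and the tokenization rule are hypothetical choices made for illustration; they are not the conventions or procedure used for the corpus.

```python
import re
from collections import Counter

# Hypothetical filled-pause markers; the corpus's actual transcription
# conventions are not specified in the text.
FILLED_PAUSES = frozenset({"uh", "um", "er"})


def transcript_stats(transcript, duration_minutes, exclude=FILLED_PAUSES):
    """Return token count, type count, and words per minute for one speaker."""
    if duration_minutes <= 0:
        raise ValueError("duration_minutes must be positive")
    # Assumed tokenization: lowercase runs of letters and apostrophes.
    words = [w for w in re.findall(r"[a-z']+", transcript.lower())
             if w not in exclude]
    counts = Counter(words)
    return {
        "tokens": len(words),   # every running word
        "types": len(counts),   # distinct word forms
        "words_per_minute": len(words) / duration_minutes,
    }


# Made-up example response, not taken from the corpus.
sample = "I like the movie because um the story is the best"
stats = transcript_stats(sample, duration_minutes=0.5)
print(stats)
```

Keeping tokens and types separate is what allows the kind of comparison made in the text, where speakers differ widely in how many words they produce but much less in how many distinct words they use. Whether filled pauses and word fragments should count as words is a design decision; here they are excluded only as an assumption.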




Publication date: 2009